\section{Future Work}
\label{futurework}

A natural continuation of this project would involve a number of improvements.
In the current Q-learning system, drivers are punished by increasing the cost of the edge between two nodes in the graph by a fixed amount. We could instead design the Q-values to incorporate these penalties directly, which would make punishment propagation automatic. Moreover, we need better reward functions for both the EWA and Q-learning algorithms, that is, functions that truly capture the agents' preferences over states and outcomes; this improvement would benefit the work the most. Further work on learning the parameters of these algorithms would also greatly improve the quality of the results. Finally, a graphical user interface for running simulations could prove useful for inspecting the choices made by agents.
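To make the first idea concrete, the sketch below folds the punishment into the reward term of a standard tabular Q-update, rather than inflating edge costs by a fixed amount; the bootstrapped target then propagates the punishment automatically. All names here (\texttt{ALPHA}, \texttt{GAMMA}, \texttt{penalty}) are illustrative assumptions, not our simulator's actual interface.

```python
from collections import defaultdict

# Assumed learning rate and discount factor for the sketch.
ALPHA, GAMMA = 0.1, 0.9

def q_update(Q, state, action, next_state, base_reward, penalty, actions):
    """One tabular Q-learning step with the punishment folded into the reward.

    The penalty enters as a negative reward instead of a fixed edge-cost bump,
    so it propagates backward through the bootstrapped Q-values on its own.
    """
    reward = base_reward - penalty
    best_next = max(Q[(next_state, a)] for a in actions) if actions else 0.0
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    return Q[(state, action)]

Q = defaultdict(float)
new_q = q_update(Q, state=0, action=1, next_state=1,
                 base_reward=1.0, penalty=0.5, actions=[0, 1])
```

With all Q-values initialized to zero, a single step moves $Q(0,1)$ by $\alpha(r - 0) = 0.1 \times 0.5 = 0.05$, and later updates along paths into that edge inherit the penalty through the $\max$ term.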

Another important improvement concerns the model of our game. For the EWA algorithm, we need experience models that allow a driver agent to react quickly to the presence or absence of a police agent on its current road segment. This could be accomplished by building a model that takes time into account when evaluating the individual history of a strategy, so that the decay of an agent's experience reflects the transient nature of the game. In addition, our work would benefit from more experiments, a larger set of metrics, and a more rigorous analysis of the equilibrium strategies.
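One way such a time-aware experience model might look is sketched below: a single-strategy EWA attraction update in which the experience-decay factor $\phi$ shrinks exponentially with the time elapsed since the strategy was last evaluated, so stale observations fade and the agent reacts faster when a police agent appears or leaves. The exponential form, the half-life parameter, and all names are assumptions for illustration, not our actual formulation.

```python
def ewa_update(A, N, payoff, chosen, dt, delta=0.5, rho=0.9, half_life=5.0):
    """One EWA step for a single strategy with time-decayed experience.

    A: previous attraction, N: previous experience weight, dt: time since
    this strategy was last evaluated. phi decays with dt (assumed form).
    """
    phi = 0.5 ** (dt / half_life)                 # older experience fades
    N_new = rho * N + 1.0                          # experience weight update
    weight = delta + (1.0 - delta) * (1.0 if chosen else 0.0)
    A_new = (phi * N * A + weight * payoff) / N_new
    return A_new, N_new

A, N = 1.0, 1.0
A, N = ewa_update(A, N, payoff=2.0, chosen=True, dt=1.0)
```

Large gaps between evaluations of a strategy drive $\phi$ toward zero, so the attraction is rebuilt almost entirely from fresh payoffs, which is the reactive behavior we want in a transient game.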

Additionally, we must address the problem of multiple police agents choosing the same node, a problem inherent to the multi-agent nature of the game. It arises because a single node can appear as the most attractive option to all agents at once, and the ``fly-over'' movement of the police agents, who do not need to traverse intermediate edges to navigate the graph, lets them all reach it. One possible fix is to constrain each agent's movement to the best node closest to it, which would help enforce spatial locality.
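The proposed spatial-locality constraint could be sketched as a selection rule that ranks candidate nodes by value and breaks ties by distance to the agent, so two equally attracted agents resolve to different nodes. The \texttt{values} map and \texttt{dist} function are assumed interfaces, not our simulator's actual API.

```python
def constrained_choice(agent_pos, candidates, values, dist):
    """Pick the highest-valued node, breaking value ties by proximity.

    Sorting key: higher value wins first, then smaller distance to the
    agent, which pulls each agent toward nearby hotspots (spatial locality).
    """
    return max(candidates, key=lambda n: (values[n], -dist(agent_pos, n)))

# Toy example: nodes 1 and 2 tie on value; the closer node 1 is chosen.
values = {1: 5.0, 2: 5.0, 3: 1.0}
dist = lambda a, b: abs(a - b)   # stand-in 1-D distance for illustration
choice = constrained_choice(agent_pos=0, candidates=[1, 2, 3],
                            values=values, dist=dist)
```

A second agent positioned past node 2 would resolve the same tie in node 2's favor, so the two agents spread out instead of converging on one node.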

Lastly, we would like to see this work extended analytically through a deeper exploration of the EWA formulation. In particular, we would like to investigate its current structure in search of substructures that could support dynamic programming solutions to the problem, the main hope being that the attraction calculations could be decomposed into optimal substructures that lead toward a globally optimal solution.
